Risk assessment of substance use disorders based on the human leukocyte antigen (HLA)

Substance use disorders (SUDs) are common and costly conditions that are partially attributable to genetic factors. In light of immune system influences on neural and behavioral aspects of addiction, the present study evaluated the influence of genes involved in the human immune response, human leukocyte antigen (HLA), on SUDs. We used an immunogenetic epidemiological approach to evaluate associations between the population frequencies of 127 HLA alleles and the population prevalences of six SUDs (alcohol, amphetamine, cannabis, cocaine, opioid, and “other” dependence) in 14 countries of Continental Western Europe to identify immunogenetic profiles of each SUD and evaluate their associations. The findings revealed two primary groupings of SUDs based on their immunogenetic profiles: one group comprised cannabis and cocaine, whereas the other group comprised alcohol, amphetamines, opioids, and “other” dependence. Since each individual possesses 12 HLA alleles, the population HLA-SUD scores were subsequently used to estimate individual risk for each SUD. Overall, the findings highlight similarities and differences in immunogenetic profiles of SUDs that may influence the prevalence and co-occurrence of problematic SUDs and may contribute to assessment of SUD risk of an individual on the basis of their HLA genetic makeup.

Substance use disorders (SUDs) are common worldwide, resulting in significant health and economic costs 1 . It is well established that both genetic and environmental influences shape substance use 2 , with 50-60% of SUD risk attributed to heritable contributions 3 . Although specific genetic influences on alcohol use disorders have been widely investigated and documented, evidence on genetic influences on other SUDs such as cocaine, opioids, and cannabis is limited despite their high prevalences, leading researchers to highlight the need for additional research on the genetics of those SUDs 4 . Concurrently, in light of evidence demonstrating immune system influences on neural and behavioral aspects of addiction 5 , there has been increasing emphasis on immunological and psychoneuroimmunological aspects of SUDs 5,6 . Here, we bridge those lines of research by evaluating the immunogenetics of six SUDs, namely alcohol, amphetamine, cannabis, cocaine, opioid, and "other" dependence (a residual category including hallucinogens, inhalants, sedatives, and solvent dependence), according to their human leukocyte antigen (HLA) profiles.
The HLA region of chromosome 6 codes for two classical types of cell surface proteins that are instrumental in immune system surveillance and elimination of non-self antigens. Class I HLA molecules (HLA-A, B, C) bind and export small peptides from proteolytically degraded cytosolic foreign antigens to the cell surface for presentation to CD8+ cytotoxic T cells, signaling cell destruction. Class II HLA molecules (HLA-DPB1, DQB1, DRB1) present larger peptides derived from endocytosed exogenous antigens to CD4+ T cells to facilitate B cell mediated antibody production and adaptive immunity. The HLA region is the most highly polymorphic region of the human genome 7 , and variation in HLA has been shown to contribute to variation in disease susceptibility 8 . HLA-disease associations have been most widely established for autoimmune disorders 9 ; however, HLA associations have been increasingly documented for diseases not traditionally characterized primarily by immune system dysregulation including various psychiatric conditions 10 . With regard to SUDs, HLA has been implicated as an important genetic factor associated with alcohol dependence 11,12 and alcohol-related liver disease 13 , although methodological limitations have rendered findings of HLA associations with alcohol dependence across studies largely inconsistent 14 . HLA associations with other SUDs have received modest attention in humans (cf. ref. 15 ); however, major histocompatibility complex class I (MHCI; HLA Class I equivalent) expression in dopaminergic neurons has been shown to play a key role in suppressing reward-seeking behavior related to cocaine use in mice 16 , and morphine administration in rats has been shown to suppress MHCII (HLA Class II equivalent) expression 17 , highlighting the interactions between addictive substances and immunogenetics.
Here, we used an immunogenetic epidemiological approach to evaluate associations between the population frequencies of a large number of HLA alleles and the population prevalences of SUDs in Continental Western Europe to begin to elucidate immunogenetic profiles for SUDs (SUD HLA ). Furthermore, since SUDs frequently co-occur 18 , we evaluated associations between SUD HLA profiles to identify immunogenetic influences underlying their co-occurrence. Finally, since each individual carries 12 HLA alleles (two alleles per HLA gene), we used the population scores to estimate individual SUD risk.

Results
Immunogenetics profiles. The immunogenetic scores of the 6 SUDs and the alleles of 6 classical HLA genes A, B, C, DPB1, DQB1, and DRB1 (127 alleles in total) are given in Tables 1, 2, 3, 4, 5, 6, respectively, their frequency distributions are plotted in Fig. 1, and their descriptive statistics are given in Table 7. In the permutation test, where SUD prevalences were randomly paired with HLA allele frequencies, not a single case (out of 1,000,000 runs) was found to match the observed SUD HLA profiles of any of the 6 SUDs, thus rejecting the null hypothesis that the observed profile could be accounted for by chance (P < 1 × 10⁻⁶). The same results were obtained in the ranks version of the random permutations test, which relaxed the requirement of an exact SUD HLA match and focused instead on a match of the ranked SUD HLA scores: no case of an exact match was found, thus rejecting again the null hypothesis that the SUD HLA profiles could be accounted for by chance (P < 1 × 10⁻⁶). Therefore, we analyzed the 6 sets of SUD HLA scores with the following results.
Associations between SUD HLA scores. All 15 pairwise correlations of the SUD HLA scores of the 6 SUDs are given in Table 8. Most notable are the high positive correlations between the cannabis and cocaine scores (Fig. 2A), and between the opioid and "other" scores (Fig. 2B).
Factor analysis of SUD HLA scores. The factor analysis yielded 2 components (with eigenvalue > 1) which accounted for 72.9% of the variance (Table 9; Fig. 3A, scree plot). The correlation between the components was very low (r = 0.099). The specific assignment of the 6 SUDs to the 2 factor analysis components was inferred from the factor analysis structure matrix, which provides the correlations between SUD type and factor analysis component (Table 10). It can be seen that alcohol, amphetamine, opioid and "other" SUDs were primarily associated with Component 1, whereas cannabis and cocaine use disorders were primarily associated with Component 2. This is illustrated in the component plot of Fig. 3B, where it can be seen that alcohol, amphetamine, opioid and other disorders project at high values on Component 1, whereas cannabis and cocaine use disorders project highly on Component 2.
Application to individuals: assessment of SUD risk based on the individual's whole HLA profile. Since each individual carries a total of 12 HLA alleles (2 for each of the 6 classical HLA genes), we used the average (τ, Eq. 2) of the 12 SUD HLA scores of an individual as an estimate of the risk of that individual for a particular SUD. In order to be able to interpret this risk measure, we standardized τ with respect to a large simulated population by generating, for each SUD, a large sample of expected τ* values (N = 1,000,000) using a bootstrap procedure, where τ* was the average of 12 randomly selected SUD HLA scores (2 per gene). The resulting frequency distributions of τ* were unimodal, approximating a normal distribution; descriptive statistics of these distributions of the 6 SUD τ* values are given in Table 11. The Pearson correlations between the 6 SUD τ* distributions were very similar to those of SUD HLA (Table 8) and are given in Table 12. Similarly, the same factor analysis applied to the τ* distributions yielded the same number of components (Table 13) and component structure matrix (Table 14) as the distributions of SUD HLA scores (Tables 9 and 10). The τ* distribution for alcohol is shown in Fig. 4. The red line is at the level of mean + 2 SD, thus providing an estimated threshold of excessive alcohol SUD risk, following the rationale of the T score used to estimate risk related to bone density. We employed a similar approach here and used the z-score of the τ* distribution as a continuously varying risk score. The relevant computations are shown in Table 15, where T thresholds for excessive risk (> mean + 2 SD) are given for each SUD.
For a given individual, the only information needed to compute their T score is the set of the 12 HLA alleles the individual carries: then, using Tables 1, 2, 3, 4, 5, 6, the average τ score is calculated and its z-score (for a particular SUD) is computed and referred to the threshold(s) in Table 15 for assessment of the risk. As an applied exercise, we calculated the T scores for the best (lowest SUD HLA ) and worst (highest SUD HLA ) cases by averaging the 2 smallest (for the former case) and the 2 largest (for the latter case) SUD HLA scores from each one of the 6 genes, yielding the τ min and τ max , respectively. The relevant data, τ, and T values are given in Tables 16 and 17, for lowest and highest risk assessments.
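The T-score calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the mean and SD are the alcohol values listed in Table 15, and the individual τ values in the usage note are made up for demonstration.

```python
def t_score(tau, mean_tau_star, sd_tau_star):
    """z-score of an individual's tau relative to the bootstrap tau* distribution."""
    return (tau - mean_tau_star) / sd_tau_star


def exceeds_threshold(tau, mean_tau_star, sd_tau_star, n_sd=2.0):
    """True when tau lies above mean + n_sd * SD (the excessive-risk threshold)."""
    return tau > mean_tau_star + n_sd * sd_tau_star


# Alcohol tau* mean and SD as listed in Table 15.
ALCOHOL_MEAN = 0.01282
ALCOHOL_SD = 0.088216

# Excessive-risk threshold: mean + 2 SD.
threshold = ALCOHOL_MEAN + 2 * ALCOHOL_SD
```

For example, `exceeds_threshold(0.20, ALCOHOL_MEAN, ALCOHOL_SD)` would flag a hypothetical individual whose average alcohol score is 0.20, because the threshold works out to 0.01282 + 2 × 0.088216 = 0.189252.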

Discussion
Here we used an epidemiological approach to evaluate the immunogenetic profiles of 6 SUDs and their associations and to estimate individual SUD risk. We documented robust immunogenetic associations between SUDs at both the population and individual level characterized by two groupings: one comprised solely of cannabis and cocaine dependence and the other of alcohol, amphetamine, opioid, and other dependence. These findings, which provide novel evidence of immunogenetic associations with SUDs, are discussed below.
Relatively few studies have focused on HLA-SUD associations, and most previous HLA-SUD association studies have been limited to alcohol use disorders. Several studies in the 1980s documented HLA associations with alcohol use disorders and sequelae, although findings across studies were inconsistent 14 and that line of research subsequently dwindled; however, recent advances in genetic association studies have renewed interest in the role of HLA in alcohol use disorders. To that end, a recent candidate gene association study identified several single nucleotide polymorphisms (SNPs) in the HLA-DRA gene that were associated with alcohol dependence 12 , and epigenetic changes of several genes related to inflammation and immune system regulation including HLA have been reported among those with alcohol use disorders 19 . Our findings suggest that HLA-SUD associations extend beyond alcohol to other addictive substances and highlight immunogenetic groupings among SUDs. We found evidence of two distinct groups of SUDs based on their immunogenetic profiles. Previous research evaluating genetic and environmental risk for SUDs in twins identified two genetic factors for SUDs that, remarkably, correspond with findings from the present study 20 . Specifically, similar to our findings, Kendler et al. 20 found that cocaine and cannabis loaded onto 1 genetic factor, whereas other SUDs (licit substances including alcohol, nicotine, and caffeine dependence) loaded onto a separate, albeit highly intercorrelated, genetic factor. Two large twin studies of illicit drugs found predominantly common genetic risk shared across illicit SUDs 21,22 , with modest specific genetic influences on risk for some drugs 22 .
Our findings, which utilize a different approach based on population immunogenetics, extend the literature by evaluating the influence of HLA genes on population and individual risk for SUDs, and document that genes involved in the immune response to foreign antigens are associated with two clusters of SUDs based on immunogenetic profiles.

Table 15. T-score calculation for estimated individual SUD risk. N = 1,000,000 per SUD*. See text for details.
SUD | Mean SUD τ* | SD | T-score = Risk | Higher risk threshold (mean + 2SD)
Alcohol | 0.01282 | 0.088216

Table 16. Lowest risk HLA genotypes for the 6 SUDs studied. The 2 alleles for each gene/SUD combination have the 2 most negative HLA-SUD scores (Table 2). τ is the average of the scores of the 12 alleles in a column. T is the z-score of τ.

Cannabis and cocaine dependence formed one HLA-based SUD group that was distinguished from the other grouping containing all four of the other SUDs. It is notable that cannabis and cocaine SUD HLA profiles were very highly correlated both at the population level and for individual SUD risk. Their correspondence was further reflected in analyses identifying the 2 alleles (out of 127) associated with the highest risk of each SUD. For cannabis and cocaine, the high risk alleles were virtually identical (9 out of 12 alleles) and minimally overlapped with high risk HLA alleles for SUDs in the other group (Table 12). In fact, in some cases the genes associated with the highest risk for an SUD from one group were associated with the lowest risk for an SUD in the other group as exemplified by A*02:01 conferring high risk for alcohol and low risk for cocaine (Table 11). Prior studies reviewed elsewhere 23 have highlighted links between cannabis and cocaine including similar neuropharmacological actions of cannabinoids and cocaine, and endocannabinoid system involvement in cocaine addiction. In contrast, cannabidiol has been shown to inhibit the reward-facilitating effect of opioids and other substances that were part of the second cluster in the present study 24,25 . Beyond differences in brain reward effects, the separate clustering of cocaine and cannabis from the other SUDs investigated here documents immunogenetic differences between the two clusters of SUDs.
Similar to the cannabis-cocaine grouping, the finding that alcohol, amphetamines, opioids, and "Other" dependencies clustered together suggests common HLA associations amongst those SUDs that differ from the cannabis-cocaine cluster. Indeed, the SUD HLA correlations among alcohol, amphetamine, opioid, and "Other" were considerably stronger than their associations with cannabis or cocaine. Taken together, the present findings highlight similarities and differences in the immunogenetic profiles of SUDs. The HLA-SUD associations documented here are particularly interesting in light of research on immunotherapies for treatment of addictions including several Phase I and II clinical trials evaluating the effectiveness of anti-addiction vaccines and antibodies aimed at preventing drugs from reaching the brain and activating reward centers 26 .
To our knowledge, this is the first study to evaluate immunogenetic profiles of SUDs and their associations. The findings of this immunogenetic epidemiological study provide novel insights regarding HLA-SUD associations; however, the findings must be considered within the context of study limitations. First, this was an epidemiological study. We utilized the population level SUD HLA scores to estimate individual risk, although future studies are warranted to determine whether the individual risk estimates are corroborated in vivo. Second, the current study focused on Continental Western Europe. Since geographic and ethnic variation in HLA is well established 27,28 and SUD prevalence varies globally 1 , the HLA-SUD associations identified here may vary in other regions. An additional consideration involves reporting of illicit substances. Some individuals may be hesitant to disclose problematic substance use, which may impact estimates; however, potential reporting biases are somewhat mitigated by the fact that population estimates of SUDs used in the present analyses were obtained from the Global Burden of Disease study, which is the most comprehensive epidemiological study of diseases including SUDs. Finally, many other genetic and environmental factors not investigated here contribute to SUDs; how those factors interact with HLA to influence SUD prevalence remains to be investigated.

Materials and methods
Epidemiological data. Prevalence of substance use disorders. The population prevalence of alcohol use disorder, amphetamine use disorder, cannabis use disorder, cocaine use disorder, opioid use disorder, and other drug use disorders in 2019 was computed for each of the following 14 countries in Continental Western Europe (CWE): Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland. Specifically, the total number of people with each SUD in each of the 14 CWE countries was identified from the Global Health Data Exchange 29 , a publicly available catalog of data from the Global Burden of Disease study, the most comprehensive worldwide epidemiological study of more than 350 diseases. The number of people with each SUD in each country was divided by the total population of each country in 2016 30 and expressed as a percentage. We have previously shown that life expectancy for these countries is virtually identical 31 ; therefore, life expectancy was not included in the current analyses.

Table 17. Highest risk HLA genotypes for the 6 SUDs studied. The 2 alleles for each gene/SUD combination have the 2 most positive HLA-SUD scores (Table 2). τ is the average of the scores of the 12 alleles in a column (Eq. 2). T is the z-score of τ (Table 10).
HLA. The frequencies of all reported HLA alleles of classical genes of Class I (A, B, C) and Class II (DPB1, DQB1, DRB1) for each of the 14 CWE countries were retrieved from the website allelefrequencies.net (Estimation of Global Allele Frequencies) 32,33 on October 20, 2020. As we reported previously 31 , there were 844 distinct alleles, i.e. alleles that occurred in at least one country. Of those, 127 alleles occurred in 9 or more countries and were used in further analyses. This criterion is somewhat arbitrary but reasonable; it was partially validated in a previous study 34 .
Data analysis. HLA-SUD profiles. HLA-SUD profiles for each SUD above were derived by computing the covariance between the nonparametric normal scores 35 of the prevalence of a SUD and those of the population frequency of an allele, comprising 69 HLA Class I and 58 Class II alleles, for a total of 127 alleles. The covariance can be negative or positive, indicating a negative or positive association, respectively. The equation for the HLA-SUD score is:

$$\mathrm{SUD}_{\mathrm{HLA}} = \frac{1}{N-1}\sum_{i=1}^{N}\left(f_i-\bar{f}\right)\left(p_i-\bar{p}\right) \qquad (1)$$

where $f_i$, $p_i$ denote the normal scores of allele frequency and SUD prevalence for the ith of N countries, respectively, and $\bar{f}$, $\bar{p}$ are their means. Thus a SUD HLA profile is a vector with 127 HLA-SUD scores. It should be noted that covariance is a descriptive measure of interdependence not subject to formal statistical testing and one that has been used routinely and successfully for many years in other fields, including evolutionary biology 36 and finance 37 . Standard statistical methods were used to analyze the HLA-SUD scores, including parametric univariate (mean, standard deviation, etc.), bivariate (Pearson correlation), multivariate (factor analysis), and permutations-based statistics.
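As a rough illustration of the HLA-SUD score, the covariance of normal scores can be sketched in Python as follows. Several details here are assumptions rather than the study's documented procedure: the normal scores use Blom's rank-based formula, ties receive average ranks, the covariance uses the N − 1 denominator, and all function names are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal scores (Blom's formula is an assumed choice)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


def covariance(x, y):
    """Sample covariance with an N - 1 denominator (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def hla_sud_score(allele_freqs, prevalences):
    """Covariance between the normal scores of allele frequency and SUD prevalence.

    Both lists must be ordered by the same countries.
    """
    return covariance(normal_scores(allele_freqs), normal_scores(prevalences))
```

Calling `hla_sud_score` once per allele, with that allele's per-country frequencies and one SUD's per-country prevalences, would yield the 127 scores that make up a profile for that SUD.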
Random permutations test for assessing the statistical significance of the HLA-SUD profiles. In this analysis, we tested the null hypothesis that the HLA-SUD profiles may be due to chance by performing a permutation test, where the pairing of allele frequencies and SUD prevalences was randomly scrambled. More specifically, let H be the HLA-SUD profile for a specific SUD, and let H′ be the profile obtained after random pairing of alleles and countries. If the profiles are identical, the sum S of the absolute paired differences between them (H, H′) would be zero. We carried out this procedure 1,000,000 times for each one of the 6 HLA-SUD profiles and counted the number of times M for which the sum S was equal to zero, indicating that the randomly obtained profile would be the same as the observed one. Then, the ratio w = M/1,000,000 is the probability that the observed profile H could be due to chance. In a relaxed variation of the test, we computed the sum of the absolute differences between the ranked profiles.
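The logic of this permutation test can be sketched in Python as below. This is a simplified illustration, not the FORTRAN implementation used in the study: it assumes the scrambling is done by shuffling the prevalence values across countries, that every allele has data for the same countries, and that the inputs are already normal scores. The function names and the tolerance `tol` are hypothetical.

```python
import random


def covariance(x, y):
    """Sample covariance with an N - 1 denominator (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def profile(freqs_by_allele, prevalence):
    """One score per allele: covariance of its frequencies with the prevalence."""
    return [covariance(freqs, prevalence) for freqs in freqs_by_allele]


def permutation_test(freqs_by_allele, prevalence, n_perm=10_000, seed=12345, tol=1e-12):
    """Fraction w = M / n_perm of shuffles whose profile equals the observed one."""
    rng = random.Random(seed)
    observed = profile(freqs_by_allele, prevalence)
    shuffled = list(prevalence)
    matches = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # scramble the country pairing
        permuted = profile(freqs_by_allele, shuffled)
        # S: sum of absolute paired differences between observed and permuted profiles.
        s = sum(abs(a - b) for a, b in zip(observed, permuted))
        if s <= tol:
            matches += 1
    return matches / n_perm
```

The relaxed variant described above would compare the ranks of the two profiles instead of their raw scores; that step is omitted here for brevity.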
Factor analysis. A factor analysis (FA) was performed to identify potential groupings ("components") of SUD HLA scores. The method of principal components was used for extraction and the method of direct oblimin (delta = 0) with Kaiser normalization was used for factor rotation.
Application to individuals. Since every individual carries k = 12 classical HLA alleles (2 for each of the 3 HLA Class I and 3 Class II genes), average SUD HLA scores were calculated:

$$\tau = \frac{1}{k}\sum_{j=1}^{k}\mathrm{SUD}_{\mathrm{HLA},j} \qquad (2)$$

where $\mathrm{SUD}_{\mathrm{HLA},j}$ is the SUD HLA score of the jth allele carried by the individual. We obtained expected estimates of τ using a bootstrap procedure 38 , as follows. For each HLA gene and SUD, two SUD HLA scores were drawn randomly (with replacement) from the pool of available alleles, and the resulting 12 scores were averaged to yield a bootstrap value of τ* for a simulated "individual". The procedure was repeated 1,000,000 times, and the resulting τ* values were used for further analyses. The same random seed was used for each draw of the 12 SUD HLA values, such that SUD HLA values for all 6 SUDs referred to the same set of alleles, thus allowing for an assessment of associations between τ* distributions.
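A minimal Python sketch of this bootstrap is given below, under stated assumptions: `scores_by_gene` is a hypothetical mapping from each HLA gene to the list of SUD HLA scores of its alleles for one SUD, and Python's built-in generator stands in for the 64-bit Mersenne Twister used in the study.

```python
import random


def bootstrap_tau(scores_by_gene, n_boot=100_000, seed=2021):
    """Simulate tau* values: the average of 2 scores drawn per gene, with replacement."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_boot):
        draws = []
        for scores in scores_by_gene.values():
            # Two alleles per gene, sampled with replacement.
            draws.extend(rng.choices(scores, k=2))
        taus.append(sum(draws) / len(draws))
    return taus
```

Reusing the same seed with score tables of identical shape (same genes, same allele order) selects the same alleles in every simulated individual, which is one way to keep the τ* values of different SUDs comparable, as the text describes.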
Implementation of analysis procedures. The IBM-SPSS statistical package (version 27) was used for implementing standard statistical analyses. All P values reported are 2-sided. The permutation test and bootstrap procedure were implemented using FORTRAN (Geany, version 1.38, built on or after 2021-10-09) and a 64-bit Mersenne Twister random number generator with a large random double-precision odd seed.